-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Conversation
20a35d5
to
668520f
Compare
476af06
to
c2f18e8
Compare
FYI, I changed the merge base branch to the Don't forget to change it back to |
internal/db/dbutil/dbutil.go
Outdated
return nil, errors.New("expected migration filename to be of the form <version>_<name>") | ||
} | ||
version := versionParts[0] | ||
newContents := strings.Replace(string(oldContents), "COMMIT;", fmt.Sprintf("UPDATE schema_migrations SET version='%s', dirty=false;\nCOMMIT;", version), 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect golang-migrate to already update the version in the database. If that premise is valid, why do we need to re-update it? Would it be enough to just set dirty=false (and avoid all the version parsing code)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be enough to just set dirty=false (and avoid all the version parsing code)?
Oh right, yes, golang-migrate sets the version so we can drop the version parsing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you can't!
If you set dirty=false but not the version, the frontend will not be around to update the version (necessarily) so you'll have applied the migration but the version will be the old one. This will cause the next frontend attempt to fail (forever).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efritz golang-migrate sets the version and dirty=true prior to the migration.
https://github.com/golang-migrate/migrate/blob/v4.8.0/migrate.go#L739-L754
// set version with dirty state
if err := m.databaseDrv.SetVersion(migr.TargetVersion, true); err != nil {
return err
}
if migr.Body != nil {
m.logVerbosePrintf("Read and execute %v\n", migr.LogString())
if err := m.databaseDrv.Run(migr.BufferedBody); err != nil {
return err
}
}
// set clean state
if err := m.databaseDrv.SetVersion(migr.TargetVersion, false); err != nil {
return err
}
I believe that means the frontend doesn't need to be around to update the version.
I won't merge this in until we reach consensus here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I thought it only set dirty=true first, then dirty=false, version=target at the end.
I'm convinced you can remove it now.
cd86045
to
d6e1c4b
Compare
Codecov Report
@@ Coverage Diff @@
## clean-up-migrate-db #7909 +/- ##
=======================================================
+ Coverage 40.57% 40.57% +<.01%
=======================================================
Files 1279 1279
Lines 66889 66896 +7
Branches 6320 6320
=======================================================
+ Hits 27139 27144 +5
- Misses 37277 37278 +1
- Partials 2473 2474 +1
|
@chrismwendt I think there's a problem in this solution as well. This could cause a lot of noise during long migrations that still result in a CrashLoopBackoff that might eventually resolve without user intervention. Like, it's a better state than we're in, but it's not nearly as good as what RFC 469's proposal guarantees (imo). |
@efritz It looks like golang-migrate acquires an advisory lock before checking the dirty state, so frontends won't see the dirty state: If it did it the other way around, then on upgrade all but then frontend that first began the migration would throw errors about a dirty DB (which doesn't happen). Although the frontend isn't around to explicitly release the lock, Postgres automatically releases it once the session ends (which I'm guessing is immediately after the SQL commands finish): I tried interrupting migrations in a few different ways to see the behavior: migration-interruptions.mp4It seems to me that |
tl;dr this makes migrations atomic, which ensures that successful migrations always leave the DB in a clean state.
Prior to this change, it was possible for the frontend to disconnect from Postgres halfway through a migration (e.g. when the migration takes >5 minutes and k8s kills the frontend because it's not responding), leaving the DB marked as being in a dirty state despite the migration eventually succeeding.
After this change, each migration will set
dirty=false
within the transaction to guarantee that when a migration succeeds the DB will be in a clean state.More context:
A few alternatives:
3 teammates supported workaround 3:
Upon further consideration, it became more clear to me that workaround 2 would be better:
This PR is based on https://github.com/sourcegraph/sourcegraph/pull/7905, which is expected to be merged before this PR.